MotifOrganizer: a scalable model-based motif clustering tool for mammalian genomes.
Abstract
Assembling a comprehensive catalog of all transcription factors (TFs) and the genes they regulate (their regulons) is important for understanding gene regulation. The conserved, sequence-specific binding profiles of TFs can be characterized from whole-genome sequences with phylogenetic approaches, and a large number of such profiles have been released. Effective mining of these data sources could computationally reveal novel functional elements. Because binding sites are variable, profiles pertinent to the same TF need to be generalized by clustering. The resulting summarized familial profile is effective in identifying unknown binding sites, which in turn supports the prediction of gene co-regulation. Here we report MotifOrganizer, a scalable model-based clustering algorithm designed for grouping motifs identified from large-scale comparative genomics studies of mammalian species. The new algorithm can group motifs of variable widths, and a novel two-stage operation scheme further increases its scalability. On simulated data, MotifOrganizer performed favorably compared with distance-based and single-stage model-based clustering tools. Tests on approximately 150k motifs from the cisRED human database demonstrated that MotifOrganizer can effectively cluster whole-genome sets of mammalian motifs.
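The abstract does not spell out MotifOrganizer's statistical model, so the following is only a minimal sketch of what model-based clustering of variable-width motifs can look like, not the authors' algorithm. It assumes each motif is a count matrix over A, C, G, T, scores a motif against a cluster profile at its best relative offset (positions falling outside the profile are scored against an assumed uniform 0.25 background, so motifs of different widths get comparable likelihoods), and alternates hard assignment with profile re-estimation, i.e. a k-means-style hard EM. The names `cluster_motifs`, `score`, `to_profile` and `from_consensus`, the first-k seeding, the pseudocount, and the fixed seed widths are all hypothetical choices; the two-stage scheme described in the abstract is not modeled.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch, NOT MotifOrganizer's actual model.
# A motif is a count matrix: one [A, C, G, T] count list per position.
CountMatrix = Sequence[Sequence[float]]
Profile = List[List[float]]  # per-position base probabilities

BASES = "ACGT"
BACKGROUND = [0.25, 0.25, 0.25, 0.25]  # assumed uniform background model


def to_profile(counts: CountMatrix, pseudocount: float = 1.0) -> Profile:
    """Turn a count matrix into per-column probabilities with a pseudocount."""
    return [[(c + pseudocount) / (sum(col) + 4 * pseudocount) for c in col]
            for col in counts]


def score(motif: CountMatrix, profile: Profile) -> Tuple[float, int]:
    """Best (log-likelihood, offset) of a motif under a cluster profile.

    Motif column j is aligned to profile column offset + j. Columns outside
    the profile are scored against the background, so every motif column is
    always scored and widths stay comparable. At least one column overlaps.
    """
    best = (-math.inf, 0)
    for offset in range(-(len(motif) - 1), len(profile)):
        total = 0.0
        for j, counts in enumerate(motif):
            k = offset + j
            probs = profile[k] if 0 <= k < len(profile) else BACKGROUND
            total += sum(c * math.log(p) for c, p in zip(counts, probs))
        if total > best[0]:
            best = (total, offset)
    return best


def cluster_motifs(motifs: Sequence[CountMatrix], k: int,
                   iterations: int = 20, pseudocount: float = 1.0):
    """Hard-EM clustering of variable-width motifs into k profiles."""
    # Hypothetical seeding: the first k motifs become the initial profiles,
    # and each profile keeps its seed width throughout.
    profiles = [to_profile(m, pseudocount) for m in motifs[:k]]
    assignments: List[Tuple[int, int]] = []
    for _ in range(iterations):
        # Assignment step: best (cluster, offset) for every motif.
        new_assignments = []
        for m in motifs:
            _, offset, c = max(score(m, p) + (ci,) for ci, p in enumerate(profiles))
            new_assignments.append((c, offset))
        # Update step: pool the aligned counts of each cluster's members.
        for c in range(len(profiles)):
            width = len(profiles[c])
            totals = [[0.0] * 4 for _ in range(width)]
            for m, (ci, offset) in zip(motifs, new_assignments):
                if ci != c:
                    continue
                for j, counts in enumerate(m):
                    if 0 <= offset + j < width:
                        for b in range(4):
                            totals[offset + j][b] += counts[b]
            profiles[c] = to_profile(totals, pseudocount)
        if new_assignments == assignments:
            break  # assignments stable: converged
        assignments = new_assignments
    return [c for c, _ in assignments], profiles


def from_consensus(seq: str) -> List[List[float]]:
    """Toy count matrix: 7 of 10 observations on the consensus base, 1 on each other base."""
    return [[7.0 if base == ch else 1.0 for base in BASES] for ch in seq]


if __name__ == "__main__":
    seqs = ["AAACG", "GTTTCC", "AACG", "GTTTC", "AAACGT", "TTTCC"]
    labels, _ = cluster_motifs([from_consensus(s) for s in seqs], k=2)
    print(labels)
```

On this toy input the `AAACG`-like and `GTTTC`-like motifs should fall into separate clusters despite their different widths. A real tool would likely also need reverse-complement handling, a principled way to choose the number of clusters, and a strategy that scales to hundreds of thousands of motifs, none of which this sketch attempts.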
Similar resources
Fuzzy Clustering of Stochastic Models for Molecular Phylogenetics
A new method for the study of molecular phylogenetics based on fuzzy c-means clustering of Markov models is proposed. This approach is able to cluster whole sequences or genomes into groups whose boundaries overlap, and to reconstruct the phylogenetic trees that graphically describe the evolutionary relationships between organisms. The method is applied to examine the similarities and evolution...
Hierarchical sets: analyzing pangenome structure through scalable set visualizations
Motivation: The increase in available microbial genome sequences has increased the size of the pangenomes being analyzed. Current pangenome visualizations are not designed for the pangenome sizes possible today, and new approaches are needed to convert the growth in available information into growth in knowledge. As the pangenome data structure is essentially a coll...
A New WordNet Enriched Content-Collaborative Recommender System
Recommender systems are models that predict users' potential interests among a number of items. These systems are widespread and have many real-world applications. They are generally based on one of two structural types: collaborative filtering and content filtering. Some systems are based on both; these systems are called hybrid recommen...
Dynamic configuration and collaborative scheduling in supply chains based on scalable multi-agent architecture
Due to diversified and frequently changing customer demands, technological advances and global competition, manufacturers rely on collaboration with their business partners to share costs, risks and expertise. How to take advantage of technological advances to effectively support operations and create competitive advantage is critical to manufacturers' survival. To respond to these...
Target Tracking Based on Virtual Grid in Wireless Sensor Networks
One of the most important and typical applications of wireless sensor networks (WSNs) is target tracking. Although target tracking can provide benefits for large-scale WSNs organized into clusters, tracking a moving target in cluster-based WSNs suffers from a boundary problem. The main goal of this paper was to introduce an efficient and novel mobility management protocol, namely Target Tr...
Journal title:
- Frontiers in bioscience
Volume 5
Publication date: 2013